Skip to main content

Build a Web Page Monitor with Google Sheets using ImportXML

monitor shopping websites

RSS feeds have completely changed the way we consume information on the web. You no longer have to visit The New York Times or CNN every hour of the day to check news headlines because the feed reader is doing that for you behind the scenes.

The only problem is that not all web content is available via feeds. For instance, Amazon, eBay and Google Product Search (Froggle) are good places to find discount deals on books and electronic gadgets but unfortunately, none of these shopping sites publish feeds.

Monitor HTML Web Pages with Google Docs

Problem: Let’s say you are looking for some discount deals on iPod Nano. An option here is that you open the the Google Shopping page and search for iPod. If you don’t find the right price, repeat the same cycle next day. This may sound easy but imagine doing this for ten other products on five different shopping sites. Tedious, right?

Solution: What we can do here is build a simple spreadsheet in Google Docs that will monitor prices across all these search pages and will present them in a table so you don’t just track prices but also compare them at the same time.

To get started, you need access to Google Docs and some basic knowledge of XPath. Don’t let this scare you - XPath is a simple way to access information contained inside HTML web pages. For instance, if you want to know about all URLs that are mentioned on any web page, the XPath expression would be //a[@href]. Some more examples:

//strong means all the items in the web page with strong html tags

//@href means all the items in the web page with href element, i.e., the URLs in that page.

If you feel that writing XPath expressions is a tricky job, get the XPath checker add-on for Firefox that will help you easily determine the XPath of any element on a web page.

Scrap Web Pages with Google Docs using ImportXML and XPath

This is the search page for ‘ipod nano’ inside Google Products. As you may noticed already, the result title is formatted with CSS class ps-large-t while the the product price using the class ps-larger-t - you can easily find these class names via Firebug or from the HTML source.

google-product-search

Now we’ll create a table inside Google spreadsheet that will have the name, price and URL that will link to that product listing in Google Docs. You can use the same approach to get product data from other sites like Amazon, eBay, Buy.com, etc.

Here’s how the final spreadsheet looks like - all this is live data and will update automatically if the corresponding information is updated on Google Products.


google-docs-sheet

Get External Data in Google Docs with ImportXML

As you may have seen in the previous tutorial on Google Docs, there’re built-in spreadsheet functions to help you easily import external data into Google Docs. One such useful function is ImportXML that, like ImportHTML, can be used for screen-scrapping.

The syntax is =ImportXML("web page URL", "XPath Expression")

Coming back to the spreadsheet, in order to fetch the price of ‘ipod nano’, we type the following formula:

=ImportXML("google.com/products?q=ipod+nano", "//b[@class='ps-larger-t']")

You may replace ‘ipod nano’ with any other product name like ‘harry+potter’, ‘nikon+d60’, etc.

To enter this function into Google Docs, click an empty cell, press F2 and paste. See this Google Docs movie:

google-docs-movie

Similarly, for the product name, we use this formula:

=ImportXML("www.google.com/products?q=ipod+nano", "//a[@class='ps-large-t']")

And for the URL (product hyperlink), the formula is:

=ImportXML("http://www.google.com/products?q=ipod+nano", "//a[@class='ps-large-t']//@href")

You need to concatenate this with http://www.google.com since Google Products uses relative URLs. This can be easily fixed by adding another column with the formula

=HYPERLINK("http://www.google.com/"&B3,"click here")

Related: Import data from HTML Web Pages into Excel

Subscribe to Web Page Changes via Feeds

web-page-feed

You don’t have to check this Google Docs Spreadsheet manually to see if prices have since yesterday - just select publish     followed by “Automatically re-publish when changes are made” and subscribe to the document in your favorite RSS reader.

The author is an Excel whiz kid and blogs at Chandoo.org. This site is a a gold mine of tips related to data manipulation & visualization through Excel and other spreadsheet programs.

Comments

Popular posts from this blog

How to Get the Quiz Score in Google Forms with Apps Script

Teachers can easily create an online quiz using Google Forms and students can view their test scores immediately after form submission. Teachers can use Google Forms to create an online quiz and students can view their test scores immediately after  form submission . With Apps Script, you can set up automatic  email notifications  and send quiz scores to parents after a student has taken the quiz. Here’s a sample Google Script that will iterate through every answer in the most recent Google Form response and log the max score (points) of a gradable question and the score for the respondent’s submitted answer. function getGoogleFormQuizScore ( ) { // Returns the form to which the script is container-bound. var form = FormApp . getActiveForm ( ) ; // Get the most recently submitted form response var response = form . getResponses ( ) . reverse ( ) [ 0 ] ; // Gets an array of all items in the form. var items = form . getItems ( ) ; for ( var...

Let People Quickly Save your Events on their Calendars

Create Add to Calendar links for emails and websites and let users quickly save your events on their own Google Calendar, Outlook or Yahoo Calendar. You are organizing an online event - maybe a meeting on Zoom or a training session hosted on Google Meet - and you would like the attendees to add the event to their own calendars. Once added to their calendar, the event will act as an automatic reminder and attendees will get a notification when the conference is about to start. There are two way to go about this: You can create a new meeting in your online calendar (Google, Outlook or any other calendar) and add the individual attendees as guests so the event automatically gets added to their calendar as well. You can include an “Add to Calendar” link or button in your email messages,  forms  and website pages. Anyone can click the link to quickly save your event on to their calendars - see  live demo . Create Add to Calendar Links for Emails and Websites The  Add to C...

How to Test your Eyes using the Computer

They say that you should get your eyes checked every two years but if haven’t had the chance to see a doctor all this time, you can test your vision on your computer as well. Of course these self eye tests are no substitute for visiting your doctor but if you follow the steps well, you may get some idea about how good (or bad) your vision is.  Test your Eyes Online with the Snellen Eye Chart The Snellen Eye Chart Most of us are familiar with the Snellen Chart that has rows of alphabets of different sizes – you read these letters from a distance, usually twenty feet, and the smallest row that you can recognize accurately indicates whether you have normal vision or not. The various eye testing tools that are available online make use of the same Snellen chart. Test your Eyesight Online You should start with University at Buffalo’s  IVAC tool . Use a physical ruler to measure the length of the line on the screen (the length will vary depending on your screen resolution). Also mea...